Re: [GENERAL] Geometric, getting x and y co-ordinates GOING MAD!!!!! - Mailing list pgsql-general

From Stuart Rison
Subject Re: [GENERAL] Geometric, getting x and y co-ordinates GOING MAD!!!!!
Date
Msg-id v04020a0ab3829c147bf5@[128.40.242.190]
In response to Re: [GENERAL] Geometric, getting x and y co-ordinates GOING MAD!!!!!  (selkovjr@mcs.anl.gov)
List pgsql-general
At 7:33 pm -0400 7/6/99, selkovjr@mcs.anl.gov wrote:
>On Fri, 4 Jun 1999, Stuart Rison wrote:
>
>[snipped -- a float return type drama with a happy end]
>
>
>That wasn't so much about C as it was about how postgres handles
>return values. Here's the relevant doc page:
>
>http://www.postgresql.org/docs/programmer/xfunc414.htm
>
>As far as 'bluffing', that is what perl was intentionally built for. C
>is remarkably simple, but it assumes you know not only WHAT it does
>but also HOW. It's easy to get shot if you forget about the how part
>for a moment.

That's very true, I certainly seem to do most of my bluffing in Perl.
Thanks for the doc ref, it makes -a bit- more sense now.

>> This all stemmed from me trying to write a standard deviation/variance set
>> of aggregate functions.  This was just because a point seemed -at the time-
>> like quite a 'cute' way of storing two floats in one base type, eliminating
>> the need for arrays (the two floats being the sum of elements and the sum
>> of the elements squared stored as the x and y coords of a point
>> respectively).
>
>Although point type is a cute way of storing float pairs, it may
>become extremely inefficient in case of mega-tables. What are you
>going to do with your points? Do you build indices on them? How are the
>points distributed in 2-D? The type of distribution and order of
>points affect the performance of R-trees.

I think I may have confused you.  Did you think I was storing a table of
points as a method of storing a value with a confidence interval (e.g.
6.3+/-0.37) or perhaps matching x and y values for linear regression type
stats?

Sadly, my aggregate functions are far more trivial than that!!!  The point
(and there is only one) is used literally as a way of storing two floats by
the 'sfunc1' of my stddev and variance aggregates.

It's a very crude aggregate, meant more to teach myself the basics of
defining a new aggregate than to be used extensively.  It is based on a
posting by Jan Wieck a long time ago which used pg/tcl.

The idea is that you need to keep track of a minimum of three values to get
an accurate calculation of variance (and by extension standard deviation)
in a single pass algorithm (which I would argue is what an aggregate is):
- the sum of the elements in a series
- the sum of the squares of the elements in a series
- the number of elements in a series

'sfunc2' could easily cope with the number of elements in a series, but that
left sfunc1 to store two floats.  I couldn't find a way of getting sfunc1
to cope with arrays, so I just used a point instead (Jan used a Tcl list).
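
For anyone following along, the bookkeeping described above can be sketched
outside the database.  This is only a minimal illustration in Python, not the
C or pg/tcl code discussed in this thread: the function names merely mirror
the sfunc1/sfunc2/final-function roles and are hypothetical, and dividing by
n - 1 (the sample variance) is an assumption, since nothing above says which
form is wanted.

```python
from math import sqrt

def sfunc1(state, value):
    """Fold one value into the (sum, sum of squares) pair.

    The pair plays the role of the point: x holds the running sum,
    y holds the running sum of squares.
    """
    x, y = state
    return (x + value, y + value * value)

def sfunc2(count):
    """Count one more element of the series."""
    return count + 1

def variance_final(state, count):
    """Combine the three running values (assumed n - 1 form)."""
    if count < 2:
        return None  # undefined for fewer than two elements
    x, y = state
    return (y - x * x / count) / (count - 1)

def variance(values):
    """Single pass: each element is visited exactly once."""
    state, count = (0.0, 0.0), 0
    for v in values:
        state = sfunc1(state, v)
        count = sfunc2(count)
    return variance_final(state, count)

def stddev(values):
    """Standard deviation as the square root of the variance."""
    var = variance(values)
    # Clamp at zero: rounding can leave a tiny negative variance.
    return None if var is None else sqrt(max(var, 0.0))
```

One caveat with this form: subtracting two large, nearly equal sums can lose
precision in floating point, so it suits data of modest range best.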

>Also, if you were looking to store the (mean, SD) values in one
>column, you would be better off with the whole new type. If your
>science/confession would allow you to represent random distributions
>as intervals, such as (mean - SD/2, mean + SD/2), the intervals could
>be stored as a 1-D geometric type and indexed with R-tree, with some
>caution. If that makes sense, welcome to my segment type:
>
>http://wit.mcs.anl.gov/~selkovjr/seg-type.tgz

Yes, it makes sense and I had a look at your segment type work.  Although I
don't have a need for it yet, it looks very impressive (big sigh... what a
long and steep learning 'C' curve ahead of me... can I write all my
functions in Perl and get those to be linked dynamically ;) )

>It already has some provision for the (mean, SD) syntax, but that
>needs debugging. It works great with 'lower .. upper' syntax, where
>either 'lower' or 'upper' can be omitted. Besides, it is a variable
>precision type: your query returns exactly as many significant digits
>as you have inserted. (I couldn't stand the frustration it gives you when it
>returns 1.2000000 for the value you stored as 1.20. Even 1.20 and 1.2
>make a huge difference when you deal with measurements)
>
>--Gene

As a final suggestion for a TO-DO, should basic statistical functions
(STDDEV, VARIANCE and perhaps MODAL) be added to the standard aggregates
set?

Best regards,

Stuart.


+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+
